Extensions of Attribute Grammars for Structured Document Queries
Abstract
Widely used document specification languages such as SGML and XML model documents using extended context-free grammars. These differ from standard context-free grammars in that they allow arbitrary regular expressions on the right-hand side of productions. To query such documents, we introduce a new form of attribute grammars (extended AGs) that work directly over extended context-free grammars rather than over standard context-free grammars. Viewed as a query language, extended AGs are particularly relevant because they can take into account the inherent order of the children of a node in a document. We show that two key properties of standard attribute grammars carry over to extended AGs: efficiency of evaluation and decidability of well-definedness. We further characterize the expressiveness of extended AGs in terms of monadic second-order logic, show that their non-emptiness and equivalence problems are EXPTIME-complete, and consider several extensions of extended AGs. As an application, we show that the Region Algebra expressions introduced by Consens and Milo can be efficiently translated into extended AGs. This translation drastically improves the known upper bound on the complexity of the emptiness and equivalence tests for Region Algebra expressions.
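To make the notion of an extended context-free grammar concrete, the following is a minimal sketch, not taken from the paper. It assumes a toy grammar (the element names `doc`, `section`, `title`, `para` and the helper `conforms` are hypothetical) in which each right-hand side is a regular expression over child element names, and it checks whether an ordered tree conforms to that grammar.

```python
import re

# Hypothetical extended context-free grammar (illustrative only): each element
# name maps to a regular expression over the names of its children, where
# every child name is followed by a single space.
GRAMMAR = {
    "doc": r"title (section )+",   # a title, then one or more sections
    "section": r"title (para )*",  # a title, then any number of paragraphs
    "title": r"",                  # leaf: no children allowed
    "para": r"",                   # leaf: no children allowed
}

def conforms(node):
    """Check a (label, children) tree against GRAMMAR, top-down."""
    label, children = node
    if label not in GRAMMAR:
        return False
    # Encode the ordered child labels as one string, a trailing space each.
    word = "".join(child[0] + " " for child in children)
    if re.fullmatch(GRAMMAR[label], word) is None:
        return False
    return all(conforms(child) for child in children)

good = ("doc", [("title", []),
                ("section", [("title", []), ("para", []), ("para", [])])])
bad = ("doc", [("section", [("title", [])]), ("title", [])])  # wrong child order

print(conforms(good), conforms(bad))
```

Because the children of each node are matched as an ordered word against a regular expression, swapping two children can turn a conforming document into a non-conforming one. This is the order sensitivity the abstract refers to.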
Similar resources
Attribute grammars for unranked trees as a query language for structured documents
Document specification languages, like for instance XML, model documents using extended context-free grammars. These differ from standard context-free grammars in that they allow arbitrary regular expressions on the right-hand side of productions. To query such documents, we introduce a new form of attribute grammars (extended AGs) that work directly over extended context-free grammars rather t...
Information Retrieval from Structured Documents Represented by Attribute Grammars
This paper presents a system for Information Retrieval (IR) from collections of structured documents represented by Attribute Grammars (AG). Each document corresponds to a syntactic tree with nodes decorated with sets of attributes. The values of these attributes correspond to characteristics which specify the semantics of the textual content and the structure in order to perform IR. First, we ...
SIMON: A Grammar-based Transformation System for Structured Documents
SIMON is a grammar-based transformation system for restructuring documents. Its target applications include meta-level specification of document assembly, view definition and retrieval for multiview documents, and document type evolution. The internal document model is based on attribute grammars, and it interfaces with external document models such as SGML through input and output conversion. The...
SQL-AG: Querying structured documents using attribute grammars
Structured documents, such as program source texts, technical documentation, or XML data, comprise an important class of data in many applications. Structured documents are distinguished from flat text by their tree structure. In a program source text, this structure is the abstract syntax tree of the program. In a technical document, this structure is the division in chapters, sections, paragr...
Using Attribute Grammars to Uniformly Represent Structured Documents - Application to Information Retrieval
This paper presents ongoing work on uniformly representing structured documents by means of Attribute Grammars (AG). Each document corresponds to a syntactic tree with nodes decorated with sets of attributes. The values of these attributes correspond to characteristics which specify the semantics of both the textual content and the structural elements. We show how to use this representation for ...